In [5]:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import pymc3 as pm  # used by the model cells further down
import seaborn as sns

plt.style.use('fivethirtyeight')

%matplotlib inline
%config InlineBackend.figure_format = 'retina'

In [7]:
# Import the excel file and call it xls_file 
xls_file = pd.ExcelFile('globalterrorismdb_0616dist.xlsx')
xls_file


Out[7]:
<pandas.io.excel.ExcelFile at 0x11bc1ef10>

In [5]:
# Load the 'Data' sheet of the workbook as a DataFrame
dfwhole = xls_file.parse('Data')
dfwhole


Out[5]:
eventid iyear imonth iday approxdate extended resolution country country_txt region ... addnotes scite1 scite2 scite3 dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related
0 197000000001 1970 0 0 NaN 0 NaN 58 Dominican Republic 2 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
1 197000000002 1970 0 0 NaN 0 NaN 130 Mexico 1 ... NaN NaN NaN NaN PGIS 0 1 1 1 NaN
2 197001000001 1970 1 0 NaN 0 NaN 160 Philippines 5 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
3 197001000002 1970 1 0 NaN 0 NaN 78 Greece 8 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
4 197001000003 1970 1 0 NaN 0 NaN 101 Japan 4 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
5 197001010002 1970 1 1 NaN 0 NaN 217 United States 1 ... The Cairo Chief of Police, William Petersen, r... "Police Chief Quits," Washington Post, January... "Cairo Police Chief Quits; Decries Local 'Mili... Christopher Hewitt, "Political Violence and Te... Hewitt Project -9 -9 0 -9 NaN
6 197001020001 1970 1 2 NaN 0 NaN 218 Uruguay 3 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
7 197001020002 1970 1 2 NaN 0 NaN 217 United States 1 ... Damages were estimated to be between $20,000-$... Committee on Government Operations United Stat... Christopher Hewitt, "Political Violence and Te... NaN Hewitt Project -9 -9 0 -9 NaN
8 197001020003 1970 1 2 NaN 0 NaN 217 United States 1 ... The New Years Gang issue a communiqué to a loc... Tom Bates, "Rads: The 1970 Bombing of the Army... David Newman, Sandra Sutherland, and Jon Stewa... The Wisconsin Cartographers' Guild, "Wisconsin... Hewitt Project 0 0 0 0 NaN
9 197001030001 1970 1 3 NaN 0 NaN 217 United States 1 ... Karl Armstrong's girlfriend, Lynn Schultz, dro... Committee on Government Operations United Stat... Tom Bates, "Rads: The 1970 Bombing of the Army... David Newman, Sandra Sutherland, and Jon Stewa... Hewitt Project 0 0 0 0 NaN
10 197001050001 1970 1 1 NaN 0 NaN 217 United States 1 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
11 197001060001 1970 1 6 NaN 0 NaN 217 United States 1 ... NaN Committee on Government Operations United Stat... Christopher Hewitt, "Political Violence and Te... NaN Hewitt Project -9 -9 0 -9 NaN
12 197001080001 1970 1 8 NaN 0 NaN 98 Italy 8 ... NaN NaN NaN NaN Hijacking DB -9 -9 1 1 NaN
13 197001090001 1970 1 9 NaN 0 NaN 217 United States 1 ... NaN Committee on Government Operations United Stat... Christopher Hewitt, "Political Violence and Te... NaN Hewitt Project -9 -9 0 -9 NaN
14 197001090002 1970 1 9 NaN 0 NaN 217 United States 1 ... The fire began at 8:30 PM. The Armed Commandos... Committee on the Judiciary United States Sena... "No Evidence Of Arson Found In Barkers Fire," ... "Toward People's War for Independence and Soci... Hewitt Project 0 0 0 0 NaN
15 197001100001 1970 1 10 NaN 0 NaN 499 East Germany (GDR) 9 ... NaN NaN NaN NaN PGIS 0 1 1 1 NaN
16 197001110001 1970 1 11 NaN 0 NaN 65 Ethiopia 11 ... NaN NaN NaN NaN PGIS 0 1 1 1 NaN
17 197001120001 1970 1 12 NaN 0 NaN 217 United States 1 ... One half hour after the bomb explosion, an ano... "Blast Damages Flatbush School," New York Time... Linda Greenhouse, "Madison School Puzzled By B... Committee on Government Operations United Stat... Hewitt Project -9 -9 0 -9 NaN
18 197001120002 1970 1 12 NaN 0 NaN 217 United States 1 ... NaN Committee on the Judiciary United States Sena... "Toward People's War for Independence and Soci... NaN Hewitt Project -9 -9 0 -9 NaN
19 197001130001 1970 1 13 NaN 0 NaN 217 United States 1 ... The store was a White owned business operating... Committee on Government Operations United Stat... NaN NaN Hewitt Project -9 -9 0 -9 NaN
20 197001140001 1970 1 14 NaN 0 NaN 217 United States 1 ... NaN Committee on Government Operations United Stat... Christopher Hewitt, "Political Violence and Te... Peter F. Nardulli and Jeffrey M. Stonecash, "P... Hewitt Project -9 -9 0 -9 NaN
21 197001150001 1970 1 15 NaN 0 NaN 218 Uruguay 3 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
22 197001190002 1970 1 19 NaN 0 NaN 217 United States 1 ... Witnesses observed three African American male... Committee on Government Operations United Stat... Christopher Hewitt, "Political Violence and Te... Seattle University, "1965-1975: Troubled Times... Hewitt Project -9 -9 0 -9 NaN
23 197001190003 1970 1 19 NaN 0 NaN 217 United States 1 ... Judith and Silas Bissell were both members of ... Committee on Government Operations United Stat... Christopher Hewitt, "Political Violence and Te... Earl Caldwell, "Fear Grows In Seattle As Polic... Hewitt Project -9 -9 0 -9 NaN
24 197001190004 1970 1 19 January 19-20, 1970 0 NaN 217 United States 1 ... The building might have been shot at a second ... Committee on Government Operations United Stat... "Black Panthers Say Office Was Bombed," New Yo... "30 Shots Fired Into Office of Panthers in Jer... Hewitt Project -9 -9 0 -9 NaN
25 197001200001 1970 1 20 NaN 0 NaN 83 Guatemala 2 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
26 197001210001 1970 1 21 NaN 0 NaN 160 Philippines 5 ... NaN NaN NaN NaN PGIS -9 -9 1 1 NaN
27 197001220001 1970 1 22 NaN 0 NaN 222 Venezuela 3 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
28 197001220002 1970 1 22 NaN 0 NaN 217 United States 1 ... This attack might be linked with other episode... Committee on Government Operations United Stat... "Beef Plant Workers in Nebraska Await Call to ... NaN Hewitt Project -9 -9 0 -9 NaN
29 197001250001 1970 1 25 NaN 0 NaN 217 United States 1 ... Police, at the time suspected, that this attac... "Miss. City Is Desegregation Trouble Spot," Ch... NaN NaN Hewitt Project -9 -9 0 -9 NaN
... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ... ...
156742 201512310002 2015 12 31 NaN 0 NaN 95 Iraq 10 ... NaN "Iraq: Roundup of Security Incidents 29 Decemb... NaN NaN START Primary Collection -9 -9 0 -9 NaN
156743 201512310004 2015 12 31 NaN 0 NaN 95 Iraq 10 ... NaN "Iraq: Roundup of Security Incidents 29 Decemb... NaN NaN START Primary Collection -9 -9 0 -9 NaN
156744 201512310005 2015 12 31 NaN 0 NaN 95 Iraq 10 ... NaN "8 people killed, wounded in bomb south west o... "Iraq: Roundup of Security Incidents 29 Decemb... "Terrorism: Transcript of ISIL's Al-Bayan Radi... START Primary Collection -9 -9 0 -9 NaN
156745 201512310006 2015 12 31 NaN 0 NaN 95 Iraq 10 ... NaN "7 people killed, wounded in bomb blast east o... "Iraq: Security Roundup 1900 GMT 31 December 2... NaN START Primary Collection -9 -9 0 -9 NaN
156746 201512310008 2015 12 31 NaN 1 NaN 228 Yemen 10 ... NaN "Aden army commander survives Yemen bombing," ... "Yemen: Security Roundup 2000 GMT 31 December ... "Gunmen kill senior militia leader in Yemen's ... START Primary Collection -9 -9 0 -9 NaN
156747 201512310009 2015 12 31 NaN 0 NaN 228 Yemen 10 ... NaN "Aden army commander survives Yemen bombing," ... "Senior south Yemen army chief survives bombin... "Senior south Yemen army commander survives bo... START Primary Collection -9 -9 0 -9 NaN
156748 201512310010 2015 12 31 NaN 0 NaN 228 Yemen 10 ... NaN "Yemen: Security Roundup 2000 GMT 31 December ... NaN NaN START Primary Collection 0 1 0 1 NaN
156749 201512310012 2015 12 31 NaN 0 NaN 228 Yemen 10 ... NaN "Yemen: Security Roundup 2000 GMT 31 December ... NaN NaN START Primary Collection 0 0 0 0 NaN
156750 201512310013 2015 12 31 NaN 0 NaN 173 Saudi Arabia 10 ... NaN "Saudi-led coalition ending Yemen ceasefire, s... "Yemen: Security Roundup 2000 GMT 31 December ... NaN START Primary Collection 1 1 0 1 NaN
156751 201512310015 2015 12 31 NaN 0 NaN 60 Egypt 10 ... There is doubt that this incident meets terror... "Egypt parents, 3 children killed by shelling ... "Egypt parents, three children killed by shell... "Egypt officials: Militants in Sinai kill enti... START Primary Collection -9 -9 0 -9 NaN
156752 201512310016 2015 12 31 NaN 0 NaN 173 Saudi Arabia 10 ... NaN "Missile from Yemen kills 3 Saudi civilians," ... NaN NaN START Primary Collection -9 -9 0 -9 NaN
156753 201512310017 2015 12 31 NaN 0 NaN 155 West Bank and Gaza Strip 10 ... NaN "IDF soldier lightly wounded in West Bank ramm... "Palestinian in West Bank rams Israeli forces ... "Breaking: Soldier lightly wounded in vehicula... START Primary Collection 0 0 0 0 NaN
156754 201512310018 2015 12 31 NaN 0 NaN 209 Turkey 10 ... NaN "Anadolu Agency Reports One PKK Member Killed,... NaN NaN START Primary Collection 0 0 0 0 NaN
156755 201512310019 2015 12 31 NaN 0 NaN 214 Ukraine 9 ... NaN "Ukrainian Police Blame Second Crimean Power O... "Ukrainian police say Crimean power line damag... "Power transmission line tower in Kherson regi... START Primary Collection -9 -9 0 -9 NaN
156756 201512310020 2015 12 31 NaN 0 NaN 92 India 6 ... NaN "Grenade hurled at Police Station Khanyar," Ka... NaN NaN START Primary Collection -9 -9 0 -9 NaN
156757 201512310021 2015 12 31 NaN 0 NaN 75 Germany 8 ... NaN "1ST LEAD," Deutsche Presse-Agentur, January 1... NaN NaN START Primary Collection -9 -9 1 1 NaN
156758 201512310022 2015 12 31 NaN 0 NaN 65 Ethiopia 11 ... Casualty numbers for this incident conflict ac... "Students said leaving southern Ethiopian univ... "Hand grenade attack kills 2 at Ethiopian univ... "Hand grenade attack kills two at Ethiopian un... START Primary Collection -9 -9 0 -9 NaN
156759 201512310024 2015 12 31 NaN 0 NaN 160 Philippines 5 ... NaN "Philippines: BIFF launches simultaneous attac... "BIFF harassments, rido disrupt New Year celeb... "HeadlinesTroops beat back BIFF assaults on 3 ... START Primary Collection 0 0 0 0 201512310024, 201512310026, 201512310027
156760 201512310025 2015 12 31 NaN 0 NaN 4 Afghanistan 6 ... NaN "Afghanistan: Unknown Gunmen Murder Girls' Sch... "Highlights: Pakistan Pashto Press 2 January, ... NaN START Primary Collection 0 0 0 0 NaN
156761 201512310026 2015 12 31 NaN 0 NaN 160 Philippines 5 ... NaN "Philippines: BIFF launches simultaneous attac... "BIFF harassments, rido disrupt New Year celeb... "HeadlinesTroops beat back BIFF assaults on 3 ... START Primary Collection 0 0 0 0 201512310024, 201512310026, 201512310027
156762 201512310027 2015 12 31 NaN 0 NaN 160 Philippines 5 ... Casualty numbers for this incident conflict ac... "26 BIFF rebels slain in attacks," Manila Time... "Philippines: BIFF launches simultaneous attac... "BIFF harassments, rido disrupt New Year celeb... START Primary Collection 0 0 0 0 201512310024, 201512310026, 201512310027
156763 201512310028 2015 12 31 2015-12-31 00:00:00 0 NaN 95 Iraq 10 ... NaN "Bomb blast kills soldier, wounds another in s... "2 soliders killed, wounded in explosion in so... NaN START Primary Collection -9 -9 0 -9 NaN
156764 201512310029 2015 12 31 NaN 0 NaN 153 Pakistan 6 ... NaN "Unknown armed men fire 3 rockets in Mand and ... "Highlights: Pakistan Balochistan Press 1 Janu... NaN START Primary Collection -9 -9 0 -9 NaN
156765 201512310030 2015 12 31 NaN 0 NaN 153 Pakistan 6 ... NaN "Unknown armed men fire 3 rockets in Mand and ... "Highlights: Pakistan Balochistan Press 1 Janu... NaN START Primary Collection 0 0 0 0 NaN
156766 201512310031 2015 12 31 NaN 0 NaN 153 Pakistan 6 ... NaN "Unknown armed men fire 3 rockets in Mand and ... "Highlights: Pakistan Balochistan Press 1 Janu... NaN START Primary Collection 0 0 0 0 NaN
156767 201512310032 2015 12 31 NaN 0 NaN 34 Burundi 11 ... Casualty numbers for this incident conflict ac... "1 dead, dozens injured in Burundi grenade att... "One dead, dozen hurt in Burundi grenade attac... NaN START Primary Collection -9 -9 0 -9 NaN
156768 201512310033 2015 12 31 2015-12-31 00:00:00 0 NaN 96 Ireland 8 ... NaN "Gardaí investigate Knocknaheeny pipe bomb," C... NaN NaN START Primary Collection -9 -9 0 -9 NaN
156769 201512310034 2015 12 31 NaN 0 NaN 160 Philippines 5 ... NaN "Village official killed, 2 injured in Marawi,... NaN NaN START Primary Collection -9 -9 0 -9 NaN
156770 201512310036 2015 12 31 2015-12-31 00:00:00 0 NaN 182 Somalia 11 ... NaN "Africa Command OSINT Daily 31 December 2015,"... NaN NaN START Primary Collection 0 0 0 0 NaN
156771 201512310037 2015 12 31 NaN 1 2016-01-16 00:00:00 113 Libya 10 ... Casualty numbers for this attack conflict acro... "20 Egyptian captives return from Libya, one r... "21 abducted nationals released in Libya," Cai... "Libya Daily Digest January 20, 2016," Libya D... START Primary Collection 0 1 1 1 NaN

156772 rows × 137 columns


In [6]:
## Save the DataFrame as a CSV file
dfwhole.to_csv('gtd1970_2015.csv', encoding='utf8')

In [7]:
len(dfwhole)


Out[7]:
156772

In [8]:
#Checking column counts

dfwhole.count()


Out[8]:
eventid               156772
iyear                 156772
imonth                156772
iday                  156772
approxdate              4756
extended              156772
resolution              3502
country               156772
country_txt           156772
region                156772
region_txt            156772
provstate             142252
city                  156326
latitude              152253
longitude             152253
specificity           156772
vicinity              156772
location               42211
summary                90632
crit1                 156772
crit2                 156772
crit3                 156772
doubtterr             156771
alternative            24236
alternative_txt       156772
multiple              156772
success               156772
suicide               156772
attacktype1           156772
attacktype1_txt       156772
                       ...  
propextent             56352
propextent_txt        156772
propvalue              31312
propcomment            49422
ishostkid             156594
nhostkid               11268
nhostkidus             11213
nhours                  3302
ndays                   6582
divert                   289
kidhijcountry           3290
ransom                 75092
ransomamt               1195
ransomamtus              411
ransompaid               623
ransompaidus             402
ransomnote               421
hostkidoutcome          8685
hostkidoutcome_txt    156772
nreleased               8095
addnotes               21924
scite1                 90442
scite2                 61161
scite3                 34132
dbsource              156772
INT_LOG               156772
INT_IDEO              156772
INT_MISC              156772
INT_ANY               156772
related                20422
dtype: int64

In [176]:
# List all column names
list(dfwhole)


Out[176]:
[u'eventid',
 u'iyear',
 u'imonth',
 u'iday',
 u'approxdate',
 u'extended',
 u'resolution',
 u'country',
 u'country_txt',
 u'region',
 u'region_txt',
 u'provstate',
 u'city',
 u'latitude',
 u'longitude',
 u'specificity',
 u'vicinity',
 u'location',
 u'summary',
 u'crit1',
 u'crit2',
 u'crit3',
 u'doubtterr',
 u'alternative',
 u'alternative_txt',
 u'multiple',
 u'success',
 u'suicide',
 u'attacktype1',
 u'attacktype1_txt',
 u'attacktype2',
 u'attacktype2_txt',
 u'attacktype3',
 u'attacktype3_txt',
 u'targtype1',
 u'targtype1_txt',
 u'targsubtype1',
 u'targsubtype1_txt',
 u'corp1',
 u'target1',
 u'natlty1',
 u'natlty1_txt',
 u'targtype2',
 u'targtype2_txt',
 u'targsubtype2',
 u'targsubtype2_txt',
 u'corp2',
 u'target2',
 u'natlty2',
 u'natlty2_txt',
 u'targtype3',
 u'targtype3_txt',
 u'targsubtype3',
 u'targsubtype3_txt',
 u'corp3',
 u'target3',
 u'natlty3',
 u'natlty3_txt',
 u'gname',
 u'gsubname',
 u'gname2',
 u'gsubname2',
 u'gname3',
 u'ingroup',
 u'ingroup2',
 u'ingroup3',
 u'gsubname3',
 u'motive',
 u'guncertain1',
 u'guncertain2',
 u'guncertain3',
 u'nperps',
 u'nperpcap',
 u'claimed',
 u'claimmode',
 u'claimmode_txt',
 u'claim2',
 u'claimmode2',
 u'claimmode2_txt',
 u'claim3',
 u'claimmode3',
 u'claimmode3_txt',
 u'compclaim',
 u'weaptype1',
 u'weaptype1_txt',
 u'weapsubtype1',
 u'weapsubtype1_txt',
 u'weaptype2',
 u'weaptype2_txt',
 u'weapsubtype2',
 u'weapsubtype2_txt',
 u'weaptype3',
 u'weaptype3_txt',
 u'weapsubtype3',
 u'weapsubtype3_txt',
 u'weaptype4',
 u'weaptype4_txt',
 u'weapsubtype4',
 u'weapsubtype4_txt',
 u'weapdetail',
 u'nkill',
 u'nkillus',
 u'nkillter',
 u'nwound',
 u'nwoundus',
 u'nwoundte',
 u'property',
 u'propextent',
 u'propextent_txt',
 u'propvalue',
 u'propcomment',
 u'ishostkid',
 u'nhostkid',
 u'nhostkidus',
 u'nhours',
 u'ndays',
 u'divert',
 u'kidhijcountry',
 u'ransom',
 u'ransomamt',
 u'ransomamtus',
 u'ransompaid',
 u'ransompaidus',
 u'ransomnote',
 u'hostkidoutcome',
 u'hostkidoutcome_txt',
 u'nreleased',
 u'addnotes',
 u'scite1',
 u'scite2',
 u'scite3',
 u'dbsource',
 u'INT_LOG',
 u'INT_IDEO',
 u'INT_MISC',
 u'INT_ANY',
 u'related']

In [163]:
## Attack counts per country (all attack types). Four of the top five countries
## are in the MENA and South Asia regions, so those regions are worth a closer look.
dfwhole['country_txt'].value_counts()


Out[163]:
Iraq                              18770
Pakistan                          12768
India                              9940
Afghanistan                        9690
Colombia                           8077
Peru                               6085
Philippines                        5576
El Salvador                        5320
United Kingdom                     4992
Turkey                             3557
Thailand                           3338
Spain                              3239
Sri Lanka                          2982
Somalia                            2890
Nigeria                            2888
Algeria                            2720
United States                      2693
France                             2617
Yemen                              2598
Lebanon                            2413
Chile                              2334
Russia                             2104
Israel                             2085
Guatemala                          2050
West Bank and Gaza Strip           1990
Nicaragua                          1970
South Africa                       1957
Egypt                              1799
Libya                              1643
Ukraine                            1583
                                  ...  
Grenada                               5
Montenegro                            5
Gabon                                 4
Iceland                               4
Malawi                                4
Solomon Islands                       4
Barbados                              3
Dominica                              3
French Polynesia                      3
Gambia                                3
People's Republic of the Congo        3
Mauritius                             2
Turkmenistan                          2
South Yemen                           2
Vanuatu                               2
St. Kitts and Nevis                   2
Equatorial Guinea                     2
Seychelles                            2
International                         1
Vatican City                          1
Antigua and Barbuda                   1
Andorra                               1
North Korea                           1
New Hebrides                          1
South Vietnam                         1
Wallis and Futuna                     1
Falkland Islands                      1
Brunei                                1
Gibraltar                             1
St. Lucia                             1
Name: country_txt, dtype: int64

In [109]:
dfwhole.head(2)


Out[109]:
eventid iyear imonth iday approxdate extended resolution country country_txt region ... addnotes scite1 scite2 scite3 dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related
0 197000000001 1970 0 0 NaN 0 NaN 58 Dominican Republic 2 ... NaN NaN NaN NaN PGIS 0 0 0 0 NaN
1 197000000002 1970 0 0 NaN 0 NaN 130 Mexico 1 ... NaN NaN NaN NaN PGIS 0 1 1 1 NaN

2 rows × 137 columns

Setting up my prior: counts of bombings/explosions per year and country in the South Asia region from 1970 through 2000. For my two populations, Pakistan and Afghanistan, I will use bombings/explosions from 2001 through 2015 and measure how much the yearly counts of these attacks differ between the two countries.


In [73]:
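# Bombing/explosion counts (attacktype1 == 3) per year and country in South Asia (region == 6), 1970-2000
df_prior = (dfwhole[(dfwhole.attacktype1 == 3) & (dfwhole.region == 6) & (dfwhole.iyear <= 2000)]
            .groupby(['iyear', 'country_txt']).attacktype1.count().values)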

In [71]:
## Confirming it worked.
df_prior


Out[71]:
array([  2,   1,   1,   1,   2,   1,   1,   1,   3,   2,   5,   1,   9,
         1,   1,   4,   2,   2,  26,   8,   1,  66,   2,  48,   1,  16,
         9,   2,  46,   7,   9,  20,  85,   4,  23,  48,  45,  11,  16,
        72,  35, 102,   9,  16, 103,  32,  98,   2,   4, 115,  43,  15,
        23,   7,  85,  59,  34,  13,   6,  57,  28,  27,   4,  51,  42,
        20,  18,   2,  28,  61,  34,  50,   2,  42, 108,  31,  68,   1,
         2, 109,   2,  15,  28,   1,  33,  24,  26,   7,   9,  57,   1,
        19,  35,  12,   9,  84,  13,  38,  46])

In [77]:
df_prior_mean = df_prior.mean()
print(df_prior_mean)


26.7676767677

In [65]:
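# df_prior is a NumPy array at this point, so recompute the per-country totals from dfwhole
dfwhole[(dfwhole.attacktype1 == 3) & (dfwhole.region == 6) & (dfwhole.iyear <= 2000)].groupby('country_txt').attacktype1.count()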


Out[65]:
country_txt
Afghanistan      88
Bangladesh      204
India          1089
Nepal            25
Pakistan        469
Sri Lanka       775
Name: attacktype1, dtype: int64

In [117]:
dfwhole.groupby(['iyear', 'country_txt']).attacktype1.count()


Out[117]:
iyear  country_txt             
1970   Argentina                     21
       Australia                      1
       Belgium                        1
       Bolivia                        1
       Brazil                         6
       Canada                         2
       Colombia                       1
       Dominican Republic             2
       East Germany (GDR)            12
       Egypt                          1
       Ethiopia                       3
       Greece                         3
       Guatemala                      4
       Iran                           5
       Ireland                        1
       Israel                         1
       Italy                          3
       Japan                          2
       Jordan                         9
       Lebanon                        1
       Mexico                         2
       Netherlands                    2
       Nicaragua                      1
       Pakistan                       1
       Paraguay                       1
       Philippines                   10
       Spain                          4
       Switzerland                    3
       Turkey                        12
       United Kingdom                12
                                   ... 
2015   Pakistan                    1235
       Paraguay                      19
       Peru                          10
       Philippines                  717
       Qatar                          1
       Russia                        21
       Saudi Arabia                 103
       Senegal                        2
       Somalia                      407
       South Africa                   2
       South Korea                    1
       South Sudan                   54
       Sri Lanka                     11
       Sudan                        158
       Sweden                        36
       Syria                        485
       Tajikistan                     3
       Tanzania                      14
       Thailand                     277
       Trinidad and Tobago            1
       Tunisia                       17
       Turkey                       416
       Uganda                        10
       Ukraine                      637
       United Kingdom               115
       United States                 38
       Uzbekistan                     1
       Venezuela                      3
       West Bank and Gaza Strip     247
       Yemen                        668
Name: attacktype1, dtype: int64

Computing the standard deviation of the prior counts in df_prior.


In [78]:
df_prior_std = df_prior.std()
print(df_prior_std)


29.595190806

In [92]:
mean_prior_mean = df_prior_mean
mean_prior_std = df_prior_std

Setting up Afghanistan to be one of my populations from 2001 through 2015.


In [82]:
# Yearly bombing/explosion counts for Afghanistan (country == 4), 2001-2015
afp = (dfwhole[(dfwhole.attacktype1 == 3) & (dfwhole.country == 4) & (dfwhole.iyear > 2000)]
       .groupby(['iyear']).attacktype1.count().values)

In [84]:
afp


Out[84]:
array([  10,   32,   69,   39,   64,  151,  183,  187,  275,  264,  236,
        887,  827, 1015,  737])

Getting mean for Afghanistan.


In [86]:
afp_mean = afp.mean()
print(afp_mean)


331.733333333

Setting up the standard deviation for Afghanistan.


In [87]:
afp_std = afp.std()
print(afp_std)


336.369334049
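
The mean and standard deviation above can be re-derived from the yearly counts shown in Out[84]. The sketch below copies that array by hand (the variable names are mine) and also prints the sample standard deviation for comparison, since NumPy's `.std()` uses `ddof=0` (the population formula) unless told otherwise.

```python
import numpy as np

# Yearly Afghanistan counts copied from the Out[84] array above.
afp_check = np.array([10, 32, 69, 39, 64, 151, 183, 187, 275, 264, 236,
                      887, 827, 1015, 737])

mean_check = afp_check.mean()
std_population = afp_check.std()       # NumPy default: ddof=0, divides by n
std_sample = afp_check.std(ddof=1)     # sample estimate, divides by n - 1

print(mean_check)
print(std_population)
print(std_sample)
```

With only 15 yearly values, the choice of `ddof` changes the result noticeably, so it is worth stating which one is being used.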

Setting up Pakistan to be my second population from 2001 through 2015.


In [88]:
# Yearly bombing/explosion counts for Pakistan (country == 153), 2001-2015
pakp = (dfwhole[(dfwhole.attacktype1 == 3) & (dfwhole.country == 153) & (dfwhole.iyear > 2000)]
        .groupby(['iyear']).attacktype1.count().values)

Getting mean for Pakistan.


In [89]:
pakp_mean = pakp.mean()
print(pakp_mean)


421.266666667

Setting up the standard deviation for Pakistan.


In [90]:
pakp_std = pakp.std()
print(pakp_std)


428.359112065

In [95]:
with pm.Model() as model:
    
    groupPk_mean = pm.Normal('Bombings_Pak_mean', mean_prior_mean, sd=mean_prior_std)
    groupAfg_mean = pm.Normal('Bombings_Afg_mean', mean_prior_mean, sd=mean_prior_std)

In [96]:
std_prior_lower = 0.01
std_prior_upper = 100.0

with model:
    
    groupPak_std = pm.Uniform('Bombings_Pak_std', lower=std_prior_lower, upper=std_prior_upper)
    groupAfg_std = pm.Uniform('Bombings_Afg_std', lower=std_prior_lower, upper=std_prior_upper)
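
A Uniform prior puts zero probability outside `[lower, upper]`, so the posterior for each standard deviation cannot leave that range whatever the data say. The quick check below, using the sample standard deviations printed earlier in this notebook (the `_obs` and `_chk` names are mine), compares them with the chosen bounds. It is only an observation about those numbers, not a change to the model.

```python
# Values printed earlier in this notebook.
afp_std_obs = 336.369334049
pakp_std_obs = 428.359112065
std_prior_lower_chk = 0.01
std_prior_upper_chk = 100.0

inside_support = {}
for name, s in [('Afghanistan', afp_std_obs), ('Pakistan', pakp_std_obs)]:
    # True only if the sample std lies within the Uniform prior's bounds.
    inside_support[name] = std_prior_lower_chk <= s <= std_prior_upper_chk
    print('%s: sample std %.1f inside prior support: %s'
          % (name, s, inside_support[name]))
```

If a sample standard deviation falls outside the bounds, the posterior for that parameter will tend to pile up against the nearest bound, which is worth keeping in mind when reading the 'difference of stds' summary.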

In [97]:
with model:

    groupPak = pm.Normal('Bombings_Pak', mu=groupPk_mean, sd=groupPak_std, observed=pakp)
    groupAfg = pm.Normal('Bombings_Afg', mu=groupAfg_mean, sd=groupAfg_std, observed=afp)

In [98]:
with model:

    diff_of_means = pm.Deterministic('difference of means',groupPk_mean - groupAfg_mean)
    diff_of_stds = pm.Deterministic('difference of stds',groupPak_std - groupAfg_std)
    effect_size = pm.Deterministic('effect size',
                                   diff_of_means / np.sqrt((groupPak_std**2 + groupAfg_std**2) / 2))
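
The effect size defined above is the difference of means divided by the square root of the average of the two variances. As a rough illustration of the formula only, the sketch below plugs in the sample means and standard deviations printed earlier in this notebook. The `effect_size` helper and `d_plugin` are my own names, and this plug-in value is not the posterior quantity the model estimates.

```python
import numpy as np

def effect_size(mean_a, mean_b, std_a, std_b):
    """Difference of means scaled by the root of the average variance."""
    return (mean_a - mean_b) / np.sqrt((std_a ** 2 + std_b ** 2) / 2.0)

# Sample statistics printed earlier in this notebook (Pakistan first, then Afghanistan).
d_plugin = effect_size(421.266666667, 331.733333333,
                       428.359112065, 336.369334049)
print(round(d_plugin, 3))
```

The posterior summary further down can differ from this number because it uses the model's sampled means and standard deviations, which are shaped by the priors as well as the data.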

In [99]:
with model:
    trace = pm.sample(25000, njobs=4)


Auto-assigning NUTS sampler...
Initializing NUTS using advi...
Average ELBO = -540.16: 100%|██████████| 200000/200000 [00:20<00:00, 9809.53it/s] 
Finished [100%]: Average ELBO = -535.61
100%|██████████| 25000/25000 [02:34<00:00, 162.11it/s]

In [100]:
pm.plot_posterior(trace[3000:],
                  varnames=['Bombings_Pak_mean', 'Bombings_Afg_mean', 'Bombings_Pak_std', 'Bombings_Afg_std'],
                  color='#87ceeb')


Out[100]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x1492bf150>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x14b694650>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x149487e90>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x149a73ad0>], dtype=object)

In [101]:
pm.plot_posterior(trace[3000:],
                  varnames=['difference of means', 'difference of stds', 'effect size'],
                  ref_val=0,
                  color='#87ceeb')


Out[101]:
array([<matplotlib.axes._subplots.AxesSubplot object at 0x14a81bcd0>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x1427d3390>,
       <matplotlib.axes._subplots.AxesSubplot object at 0x13bcf2b10>], dtype=object)

In [102]:
pm.summary(trace[3000:],
           varnames=['difference of means', 'difference of stds', 'effect size'])


difference of means:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  50.745           27.456           0.082            [-3.243, 104.116]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  -3.105         32.311         50.779         69.292         104.272


difference of stds:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.215            0.614            0.002            [-0.947, 1.576]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  -0.860         -0.128         0.120          0.492          1.691


effect size:

  Mean             SD               MC Error         95% HPD interval
  -------------------------------------------------------------------
  
  0.510            0.276            0.001            [-0.031, 1.047]

  Posterior quantiles:
  2.5            25             50             75             97.5
  |--------------|==============|==============|--------------|
  
  -0.031         0.325          0.510          0.696          1.047
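
To make the "95% HPD interval" column concrete, here is a minimal sketch of how a highest posterior density interval can be computed from draws: sort them and take the narrowest window that contains 95% of the samples. The `hpd_interval` helper is my own illustration, not PyMC3's implementation, and the draws are synthetic stand-ins (a normal distribution with the mean and SD reported above for 'difference of means'), not the real trace.

```python
import numpy as np

def hpd_interval(samples, mass=0.95):
    """Narrowest interval containing `mass` of the samples (unimodal case)."""
    s = np.sort(np.asarray(samples, dtype=float))
    n = len(s)
    n_in = int(np.floor(mass * n))        # index offset spanned by the interval
    widths = s[n_in:] - s[:n - n_in]      # width of every candidate interval
    start = int(np.argmin(widths))        # the narrowest candidate wins
    return s[start], s[start + n_in]

# Synthetic stand-in draws, NOT the real trace.
rng = np.random.RandomState(0)
fake_draws = rng.normal(50.745, 27.456, size=100000)

hpd_low, hpd_high = hpd_interval(fake_draws)
print(hpd_low)
print(hpd_high)
print(hpd_low < 0 < hpd_high)   # does the interval contain zero?
```

An interval that contains zero means that "no difference" remains a credible value under the model, which is how the summaries above are read in the methodology section.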


In [53]:
afgtd = dfwhole[dfwhole.country_txt=='Afghanistan']

Attacks in Afghanistan by Year


In [59]:
plt.hist(afgtd['iyear'])


Out[59]:
(array([  1.00000000e+00,   3.00000000e+00,   0.00000000e+00,
          2.20000000e+01,   6.80000000e+01,   2.10000000e+01,
          7.50000000e+01,   6.25000000e+02,   1.79800000e+03,
          7.07700000e+03]),
 array([ 1973. ,  1977.2,  1981.4,  1985.6,  1989.8,  1994. ,  1998.2,
         2002.4,  2006.6,  2010.8,  2015. ]),
 <a list of 10 Patch objects>)

In [55]:
pkgtd = dfwhole[dfwhole.country_txt=='Pakistan']

In [56]:
pkgtd.tail(2)


Out[56]:
eventid iyear imonth iday approxdate extended resolution country country_txt region ... addnotes scite1 scite2 scite3 dbsource INT_LOG INT_IDEO INT_MISC INT_ANY related
156765 201512310030 2015 12 31 NaN 0 NaN 153 Pakistan 6 ... NaN "Unknown armed men fire 3 rockets in Mand and ... "Highlights: Pakistan Balochistan Press 1 Janu... NaN START Primary Collection 0 0 0 0 NaN
156766 201512310031 2015 12 31 NaN 0 NaN 153 Pakistan 6 ... NaN "Unknown armed men fire 3 rockets in Mand and ... "Highlights: Pakistan Balochistan Press 1 Janu... NaN START Primary Collection 0 0 0 0 NaN

2 rows × 137 columns

Looking at attack types in Pakistan


In [66]:
pkgtd['attacktype1_txt'].value_counts().head(5)


Out[66]:
Bombing/Explosion                 6788
Armed Assault                     3378
Assassination                     1256
Hostage Taking (Kidnapping)        727
Facility/Infrastructure Attack     286
Name: attacktype1_txt, dtype: int64

Attacks in Pakistan by Year


In [58]:
plt.hist(pkgtd['iyear'])


Out[58]:
(array([  3.00000000e+00,   7.00000000e+00,   2.50000000e+01,
          8.90000000e+01,   4.11000000e+02,   1.00000000e+03,
          3.84000000e+02,   2.20000000e+02,   2.37100000e+03,
          8.25800000e+03]),
 array([ 1970. ,  1974.5,  1979. ,  1983.5,  1988. ,  1992.5,  1997. ,
         2001.5,  2006. ,  2010.5,  2015. ]),
 <a list of 10 Patch objects>)

Methodology and Analysis: Global Terrorism Database

Methodology:

My Bayesian approach was to build a prior from all bombings/explosions in South Asia from 1970 through 2000, with Pakistan and Afghanistan as my two populations. For each country, I used bombings/explosions from 2001 through 2015. Through EDA, I saw that the numbers of these attacks were similar for both countries, spiking in the last years of the dataset (2010-2015). In total attacks of all types, Pakistan ranks second (12,768) and Afghanistan fourth (9,690). Three of the top five countries are classified as South Asia in the dataset. Even though India is third (9,940), I chose Afghanistan and Pakistan because I believed the two populations might or might not show a measurable difference.

Given the model results, I would seek an alternative analysis. The means of both populations were high and close together, so the model did not show a clear difference between them. The summaries of the posterior distributions were not conclusive: the 95% HPD intervals for the difference of means [-3.243, 104.116] and for the effect size [-0.031, 1.047] both include zero. Given the prior I created, the difference between Pakistan and Afghanistan was therefore not significant. If I ran this again, I would consider pairing India with Pakistan. Another option is to use Iraq and Afghanistan as the two populations to see whether the Bayesian approach is more successful.

Predicting the 1993 bombings/explosions:

I chose the three years before 1993 (1990-1992) and the three years after (1994-1996) and calculated the mean number of bombings/explosions over those six years, which was 1,436. I also took the number of bombings/explosions over all years (75,963) and divided it by the number of years in the dataset (44, excluding 1993), which gave 1,726. I would say that averaging these two figures may overfit to this dataset. Three years before and after provide enough data for a healthy mean to use for 1993, while the overall mean of 1,726 gives a safe range for the data. My best estimate for the number of bombings/explosions in 1993 is 1,436.
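
The arithmetic above can be sketched as follows. The first part uses only the figures quoted in the paragraph. The `neighbor_mean` helper and the `toy_yearly` counts are hypothetical placeholders that show the shape of the six-year calculation; they are not the real GTD yearly totals.

```python
# Figures quoted in the paragraph above.
total_bombings = 75963
n_years = 44
window_mean = 1436

overall_mean = total_bombings / float(n_years)
print(round(overall_mean))                   # overall yearly mean
print((window_mean + overall_mean) / 2.0)    # the averaged alternative

def neighbor_mean(counts_by_year, target_year, window=3):
    """Mean of the counts in the `window` years before and after target_year."""
    years = [y for y in range(target_year - window, target_year + window + 1)
             if y != target_year]
    return sum(counts_by_year[y] for y in years) / float(len(years))

# Hypothetical placeholder counts, NOT the real GTD yearly totals.
toy_yearly = {1990: 10, 1991: 20, 1992: 30, 1994: 40, 1995: 50, 1996: 60}
print(neighbor_mean(toy_yearly, 1993))
```

With the real per-year counts in place of `toy_yearly`, the same helper would reproduce the six-year mean used as the estimate.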